A Novel Class Imbalance Learning Method using Subset Filtering

Authors

  • K. Nageswara Rao
  • D. Rajya Lakshmi
Abstract

In many real-world applications, learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. In this paper, we present a new hybrid subset filtering approach for learning from skewed training data. This algorithm provides a simpler and faster alternative by using C4.5 as the base algorithm. We conduct experiments using eleven UCI data sets from various application domains, four base learners, and five evaluation metrics. Experimental results show that our method has higher Area under the ROC Curve, F-measure, precision, TP rate, and TN rate values than many existing class imbalance learning methods.
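The evaluation metrics named in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names (`imbalance_metrics`, `auc`) and the toy labels are assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking definition rather than any particular toolkit's routine.

```python
def imbalance_metrics(y_true, y_pred, positive=1):
    """Precision, TP rate, TN rate and F-measure for a binary problem.

    `positive` marks the minority class (an assumption of this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    tp_rate = tp / (tp + fn) if tp + fn else 0.0   # recall / sensitivity
    tn_rate = tn / (tn + fp) if tn + fp else 0.0   # specificity
    denom = precision + tp_rate
    f_measure = 2 * precision * tp_rate / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {
        "precision": precision,
        "tp_rate": tp_rate,
        "tn_rate": tn_rate,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


def auc(y_true, scores, positive=1):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive example gets the higher score (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 2 minority (1) and 8 majority (0) examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1]

metrics = imbalance_metrics(y_true, y_pred)
roc_area = auc(y_true, scores)
```

On this toy data the accuracy is 0.8 even though only half of the minority examples are found, which is one reason class imbalance studies report TP rate, TN rate, F-measure, and AUC alongside or instead of accuracy.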

Similar resources

Breast Cancer Diagnosis from Perspective of Class Imbalance

Introduction: Breast cancer is the second cause of mortality among women. Early detection is the only rescue to reduce the risk of breast cancer mortality. Traditional methods cannot effectively diagnose tumors since they are based on the assumption of a well-balanced dataset. However, a hybrid method can help to alleviate the two-class imbalance problem existing in the ...

Discriminative Sparse Neighbor Approximation for Imbalanced Learning

Data imbalance is common in many vision tasks where one or more classes are rare. Without addressing this issue, conventional methods tend to be biased toward the majority class with poor predictive accuracy for the minority class. These methods further deteriorate on small, imbalanced data that have a large degree of class overlap. In this paper, we propose a novel discriminative sparse neighb...

A comparative study on rough set based class imbalance learning

This paper performs systematic comparative studies on rough set based class imbalance learning. We compare the strategies of weighting, re-sampling and filtering used in the rough set based methods for class imbalance learning. Weighting is better than re-sampling, and re-sampling is better than filtering. The weighted rough set based method achieves the best performance in class imbalance lear...

A Novel One Sided Feature Selection Method for Imbalanced Text Classification

Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and might even treat the minority class data as outliers. The text data is one of t...

An Optimized Cost-Free Learning Using ABC-SVM Approach in the Class Imbalance Problem

In this work, cost-free learning (CFL) is formally defined in comparison with cost-sensitive learning (CSL). The primary difference between them is that even in the class imbalance problem, a CFL approach provides optimal classification results without requiring any cost information. In point of fact, several CFL approaches exist in the related studies like sampling and some criteria-based approach...

Publication date: 2012